XML schema clustering with semantic and hierarchical similarity measures
Authors
Abstract
With the growing popularity of XML as a data representation language, the number of XML data collections has exploded. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of the elements but also the similarity of their hierarchical structure. We support our findings with experiments and analysis.
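As a rough illustration of the idea in the abstract, the sketch below combines an element-name measure with a hierarchy-aware measure and then groups schemas greedily. It is not the authors' algorithm: the function names, the path-based schema representation, the depth penalty, and the 0.6 threshold are all assumptions made for this example, and plain string matching stands in for a real linguistic measure.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """String-based stand-in for linguistic similarity (no thesaurus is used)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def path_similarity(p: tuple, q: tuple) -> float:
    """Compare two root-to-element paths, aligned from the element upward.

    The ancestors provide context; the depth ratio penalises paths that sit
    at different levels of the hierarchy (an assumed, illustrative penalty).
    """
    pairs = list(zip(reversed(p), reversed(q)))
    score = sum(name_similarity(a, b) for a, b in pairs) / len(pairs)
    return score * min(len(p), len(q)) / max(len(p), len(q))


def schema_similarity(s: list, t: list) -> float:
    """Symmetric average of each path's best match in the other schema."""
    def directed(x, y):
        return sum(max(path_similarity(p, q) for q in y) for p in x) / len(x)
    return (directed(s, t) + directed(t, s)) / 2


def cluster(schemas: dict, threshold: float = 0.6) -> list:
    """Greedy single-pass grouping against each cluster's first member."""
    clusters = []
    for name, paths in schemas.items():
        for members in clusters:
            if schema_similarity(paths, schemas[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical toy schemas, each given as a list of root-to-element paths.
schemas = {
    "book1": [("library", "book", "title"), ("library", "book", "author")],
    "book2": [("library", "books", "title"), ("library", "books", "writer")],
    "order": [("order", "item", "price"), ("order", "customer", "address")],
}
print(cluster(schemas))
```

Representing each schema as a set of paths keeps the example short; a tree-based comparison would capture sibling order and cardinality, which this sketch ignores.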
Similar resources
Metaheuristic clustering of Persian XML documents based on structural and content similarity
Given the increasing number of XML documents, organizing them effectively in order to retrieve useful information is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
A semantic similarity analysis for data mappings between heterogeneous XML schemas
One of the most critical steps to integrating heterogeneous e-Business applications using different XML schemas is schema mapping, which is known to be costly and error-prone. Past research on schema mapping has not made full use of semantic information imbedded in the hierarchical structure of the XML schema. In this chapter, we investigate the existing schema mapping approaches and propose an...
A novel method for measuring semantic similarity for XML schema matching
Enterprise integration has recently gained more attention than ever before. The paper deals with an essential activity that enables seamless enterprise integration, namely similarity-based schema matching. To this end, we present a supervised approach to measure semantic similarity between XML schema documents, and, more importantly, address a novel approach to augment reliably labeled trai...
A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity
Since XML became popular for data representation and exchange over the Web, the number of XML documents has rapidly increased. It is therefore a new challenge for the field of data mining to turn these documents into a more useful information utility. We present a novel clustering algorithm, PCXSS, that places heterogeneous XML documents into various groups according ...
Schema Conversion Methods between XML and Relational Models
In this chapter, three semantics-based schema conversion methods are presented: 1) CPI converts an XML schema to a relational schema while preserving semantic constraints of the original XML schema, 2) NeT derives a nested structured XML schema from a flat relational schema by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical, and 3) CoT takes a relatio...
Journal title:
- Knowl.-Based Syst.
Volume 20, Issue
Pages -
Publication date 2007